Not since the development of the objective paper-and-pencil test early in the century 
has an assessment method hit the American educational scene with such force as has 
performance assessment methodology in the 1990s. Performance assessment relies on 
teacher observation and professional judgment to draw inferences about student 
achievement. The reasons for the intense interest in this assessment methodology can 
be summarized as follows: 

During the 1980s important new curriculum research and development efforts at school 
district, state, national and university levels began to provide new insights into the 
complexity of some of our most valued achievement targets. We came to understand 
the multidimensionality of what it means to be a proficient reader, writer, and math or 
science problem solver, for example. With these and other enhanced visions of the 
complex nature of the meaning of academic success came a sense of the insufficiency 
of the traditional multiple choice test. Educators began to embrace the reality that some 
targets, like complex reasoning, skill demonstration and product development, 
"require"--don't merely permit--the use of subjective, judgmental means of assessment. 
One simply cannot assess the ability to write well, communicate effectively in a second 
language, work cooperatively on a team, and complete science laboratory work in a 
quality manner using the traditional selected response modes of assessment. 

As a result, we have witnessed a virtual stampede of teachers, administrators and 
educational policy makers to embrace performance assessment. In short, educators 
have become as obsessed with performance assessment in the 1990s as we were with 
the multiple choice tests for 60 years. Warnings from the assessment community 
(Dunbar, Koretz, and Hoover, 1991) about the potential dangers of invalidity and 
unreliability of carelessly developed subjective assessments not only have often gone 
unheeded, but by and large they have gone unheard. 

Now that we are a decade into the performance assessment movement, however, some 
of those quality control lessons have begun to take hold. Assessment specialists have 
begun to articulate in terms that practitioners can understand the rules of evidence for 
the development and use of high quality performance assessments (e.g. Messick, 
1994). As a result, we are well into a national program of research and development 
that builds upon an ever clearer vision of the critical elements of sound assessments to 
produce ever better assessments (Wiggins, 1993). 

The purpose of this digest is to provide a summary of those attributes of sound 
assessments and the rules of evidence for using them well. The various ways the 
reader might take advantage of this information also are detailed. 

THE BASIC METHODOLOGY 

The basic ingredients of a performance assessment may be described in three parts 
(Stiggins, 1984): (1) the specification of a performance to be evaluated, (2) the 
development of exercises or tasks used to elicit that performance, and (3) the design of 
a scoring and recording scheme for results. Each part contains sub-elements. 

For example, in defining the performance to be evaluated, assessment developers must 
decide where or how evidence of academic proficiency will manifest itself. Is the 
examinee to demonstrate the ability to reason effectively, carry out other skills 
proficiently or create a tangible product? Next, the developer must analyze skills or 
products to identify performance criteria upon which to judge achievement. This 
requires the identification of the critical elements of performance that come together to 
make it sound or effective. In addition, performance assessors must define each 
criterion and articulate the range of achievement that any particular examinee's work 
might reflect, from outstanding to very poor performance. And finally, users can 
contribute immensely to student academic development by finding examples of student 
achievement that illustrate those different levels of proficiency. 

Once performance is defined, strategies must be devised for sampling student work so 
skills or products can be observed and evaluated. Examinees might be presented with 
structured exercises to which they must respond. Or the examiner might unobtrusively 
or opportunistically watch performers during naturally occurring classroom work in order 
to derive evidence of proficiency. When structured exercises are used to elicit 
performance, they must spell out a clear and complete set of performance 
responsibilities for examinees. In addition, the examiner must include in the assessment 
enough exercises to sample the array of performance possibilities in a representative 
manner that is large enough to lead to confident generalizations about examinee 
proficiency. 

And finally, once the desired performance is described and exercises have been 
devised, procedures must be spelled out for making and recording judgments. These 
scoring schemes, sometimes called rubrics, help the evaluator translate judgments of 
proficiency into ratings. The assessment developer must select the level of detail to be 
reflected in records, the method of recording results, and who will be the observer and 
rater of performance. 

SOUND PERFORMANCE CRITERIA 



Quellmalz (1993) offers a set of specific guidelines for the development of quality 
performance criteria. These reflect important aspects of skill demonstration that judges 
are to look for and evaluate--they represent important attributes of quality products. 
They are devised through a thoughtful analysis of samples of high quality performance 
and comparison to samples of inferior performance. Out of this comparison comes an 
understanding of the keys to academic success in the context for which the assessment 
is designed. Quellmalz advises us that criteria should: be significant, specifying 
important performance components; represent standards that would apply naturally to 
determine the quality of performance when it typically occurs; be generalizable--that is, 
applicable to a class of tasks--not apply to only one task; describe an appropriate 
continuum from low- to high-level achievement; communicate clearly to and be understood 
by all involved in the performance assessment process, including teachers, students, 
parents and community; and hold the promise of communicating information about 
performance quality that provides a basis for the improvement of that performance (p. 320). 
The attributes of quality performance that form the basis of judgment criteria should be 
couched in the best current thinking about the keys to academic success as defined in 
the professional literature of the discipline in question. 

SOUND PERFORMANCE EXERCISES 



Baron (1993) provides guidance in the development of sound exercises. These spell out 
the achievement to be demonstrated by the examinee, the conditions under which the 
demonstrations will take place and the criteria that will serve as the basis for evaluation 
of performance. In short, they focus the examinee sharply on the task at hand. Baron 
advises that these questions be used to determine exercise quality: when students 
prepare for my assessment tasks and I structure my curriculum and pedagogy to enable 
them to be successful on these tasks, do I feel assured that they will be making 
progress toward becoming genuine or authentic readers, mathematicians, writers, 
historians, problem solvers, etc.; do my tasks clearly communicate my standards and 
expectations to my students; are some of my tasks rich and integrative, requiring 
students to make connections and forge relationships among various aspects of the 
curriculum; do some of my tasks require that my students sustain their efforts over a 
period of time (perhaps even an entire term!) to succeed; do my tasks require 
self-assessment and reflection on the part of students; are my tasks likely to have 
personal meaning and value to my students; and do some of my tasks provide problems 
that are situated in real-world contexts and are they appropriate for the age group 
solving them? 

EFFECTIVE SCORING AND RECORDING 



The basis of the effective application of performance assessment methodology is 
thoroughly trained raters relying on sound performance criteria to observe and evaluate 
student responses to quality exercises (Stiggins, 1994). It is rarely the case that raters 
can automatically judge student performance merely as a matter of their prior 
professional development. Training--or at least a systematic verification of qualifications 
to rate performance--is essential in all contexts in which quality assessment results are 
the goal. 

One test of the quality of ratings is interrater agreement. A high degree of 
agreement is indicative of objectivity of ratings. Another test of quality is consistency in 
a particular rater's judgments over time. Ratings should not drift but rather should 
remain anchored to carefully defined points on the scoring scale. A third index of 
performance rating quality is consistency in ratings across exercises intended to be 
reflective of the same performance--an index of internal consistency. When these 
standards are met, it becomes possible to take advantage of the immense power of this 
kind of assessment to muster concrete evidence of improvement in student 
performance over time. 
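
As a rough illustration of the first test, interrater agreement can be quantified 
from two raters' scores on the same set of performances. The sketch below is only 
one possible approach: the function names, the 1-4 rating scale and the sample 
ratings are hypothetical, and Cohen's kappa is brought in here as a commonly used 
chance-corrected statistic rather than one this digest prescribes. 

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of performances on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight student performances on a 1-4 scale.
rater_a = [4, 3, 3, 2, 1, 4, 2, 3]
rater_b = [4, 3, 2, 2, 1, 4, 3, 3]

print(percent_agreement(rater_a, rater_b))
print(round(cohens_kappa(rater_a, rater_b), 2))
```

With these made-up ratings the two raters agree on six of eight performances 
(0.75), and kappa is about 0.65 once chance agreement is removed. The same two 
functions could be applied to one rater's scores at two points in time as a 
simple check on drift. 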

There are three design decisions to be made by the performance assessment 
developer with respect to scoring schemes: the level of specificity of scoring, the 
selection of the record keeping method, and the identification of the rater. Scores can 
be holistic or analytical, considering criteria together as a whole or separately. The 
choice is a function of the assessment purpose. Purposes that call for a high 
resolution microscope, such as diagnosing weaknesses in student performance, 
require analytical scoring. 
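
The difference between the two levels of specificity can be sketched as a small 
data structure. This is an illustration only: the class name, the criterion names 
and the 1-5 scale are hypothetical and do not come from any particular scoring 
guide. 

```python
from dataclasses import dataclass


@dataclass
class AnalyticScore:
    """Separate ratings for each performance criterion (hypothetical 1-5 scale)."""
    ratings: dict  # criterion name -> rating

    def weakest_criteria(self):
        """Criteria with the lowest rating, i.e. candidates for diagnosis."""
        lowest = min(self.ratings.values())
        return sorted(name for name, r in self.ratings.items() if r == lowest)


# Analytical scoring: the rater judges each criterion separately.
essay = AnalyticScore(
    {"ideas": 5, "organization": 4, "word choice": 2, "conventions": 4}
)

# Holistic scoring: the rater records one overall judgment that considers
# the criteria together. It is a judgment, not an average of the ratings above.
holistic_rating = 4

print(essay.weakest_criteria())
```

The holistic record says only that the essay is fairly strong overall. The 
analytical record shows where instruction might focus, which is why diagnostic 
purposes call for it. 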

Recording system alternatives include checklists of attributes present or absent in 
performance, rating scales reflecting a range in performance quality, anecdotal records 
that describe performance, or mental record keeping. Each offers advantages and 
disadvantages depending on the specific assessment context. 

Raters of performance can include the teacher, another expert, students as evaluators 
of each other's performance or students as evaluators of their own performance. Again, 
the rater of choice is a function of context. However, it has become clear that 
performance assessment represents a powerful teaching tool when students play roles 
in devising criteria, learning to apply those criteria, devising exercises, and using 
assessment results to plan for the improvement of their own performance--all under the 
leadership of their teacher. 

PERFORMANCE ASSESSMENT IN THE GUIDANCE CONTEXT 



The ongoing guidance and counseling function in the school could bring student service 
personnel into contact with performance assessment methodology in three important 
ways. First, other education professionals often regard counselors as sources of 
expertise in assessment and may bring requests for opinions about the value of this 
methodology, or they may ask for help in the design and development of performance 
assessments. 

Second, counselors might be invited to serve as raters of student performance in specific 
academic disciplines. If and when such opportunities arise, thorough training is 
essential for all who are to serve in this capacity. If the teachers issuing this invitation 
have developed or gleaned from their professional literature refined visions of the 
meaning of academic success, have transformed them into quality criteria and provide 
quality training for all who are to observe and evaluate student performance, this can be 
a very rewarding professional experience. If these standards are not met, it is wise to 
urge (and perhaps help with) a redevelopment of the assessment. The third and final 
contact for counselors is as an evaluator of students within the context of the guidance 
function, observing and judging academic or affective student characteristics. In this 
case, the counselor will be both the developer and user of the assessment and must 
know how to adhere to the above-mentioned standards of assessment quality. 

For all of these reasons, it is advisable for school guidance and counseling personnel to 
understand when this methodology is likely to be useful and when it is not, and how to 
design and develop sound performance assessments. 